We present a new method that provides object location priors for 6D pose estimation of previously unseen objects. Existing approaches build on a template-matching strategy and convolve a set of reference images with the query. Unfortunately, their performance degrades when the object scale differs between the references and the query. To address this issue, we present a finer-grained correlation estimation module that handles such scale mismatches by computing correlations with adjustable receptive fields. We also propose to decouple the correlations into scale-robust and scale-aware representations to estimate the object location and size, respectively. Our method achieves state-of-the-art unseen-object localization and 6D pose estimation results on LINEMOD and GenMOP. We further construct a challenging synthetic dataset, where the results highlight the better robustness of our method to varying backgrounds, illuminations, and object sizes, as well as to the reference-query domain gap.
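The abstract does not spell out the correlation module, so the following is only a hypothetical sketch of the underlying idea: correlate a reference template with the query at several receptive-field sizes by rescaling the template, take the maximum over scales as a scale-robust location estimate, and read the object size from the best-matching scale. The function names, the nearest-neighbour rescaling, and the cosine scoring are all illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def correlate_at_scale(query, template, scale):
    """Normalized cross-correlation of a reference template with the query
    after rescaling the template's receptive field by an integer factor
    (nearest-neighbour). Assumes the rescaled template fits inside the query."""
    if scale >= 1:
        t = np.kron(template, np.ones((scale, scale)))   # enlarge
    else:
        step = int(round(1 / scale))
        t = template[::step, ::step]                     # shrink
    th, tw = t.shape
    qh, qw = query.shape
    out = np.full((qh - th + 1, qw - tw + 1), -np.inf)
    t_norm = t / (np.linalg.norm(t) + 1e-8)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = query[i:i + th, j:j + tw]
            out[i, j] = np.sum(patch * t_norm) / (np.linalg.norm(patch) + 1e-8)
    return out

def localize(query, template, scales=(1, 2)):
    """Scale-robust location: best correlation peak over all scales;
    scale-aware size estimate: the scale at which that peak occurs."""
    best = (-np.inf, None, None)
    for s in scales:
        corr = correlate_at_scale(query, template, s)
        idx = np.unravel_index(np.argmax(corr), corr.shape)
        if corr[idx] > best[0]:
            best = (corr[idx], idx, s)
    return best  # (score, location, scale)
```

The real module is learned end-to-end; this sketch only illustrates why adjustable receptive fields mitigate reference-query scale mismatch.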
Knowledge distillation facilitates the training of a compact student network by using a deep teacher network. While this has achieved great success in many tasks, it remains completely unstudied for image-based 6D object pose estimation. In this work, we introduce the first knowledge distillation method driven by the 6D pose estimation task. To this end, we observe that most modern 6D pose estimation frameworks output local predictions, such as sparse 2D keypoints or dense representations, and that a compact student network typically struggles to predict such local quantities precisely. Therefore, instead of imposing prediction-to-prediction supervision from the teacher on the student, we propose to distill the teacher's \emph{distribution} of local predictions into the student network, facilitating its training. Our experiments on several benchmarks show that our distillation method yields state-of-the-art results with different compact student models and for both keypoint-based and dense prediction-based architectures.
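The abstract does not give the loss used to distill the teacher's distribution of local predictions; a minimal sketch of the general idea is a temperature-softened KL objective over flattened local-prediction maps (e.g. keypoint heatmaps), rather than an L2 loss between point predictions. The function names and the temperature value are illustrative assumptions.

```python
import numpy as np

def softmax(x, tau=1.0):
    """Temperature-softened softmax over a flat array."""
    z = (x - x.max()) / tau
    e = np.exp(z)
    return e / e.sum()

def distribution_distillation_loss(student_logits, teacher_logits, tau=2.0):
    """KL(teacher || student) between softened distributions over the
    flattened local-prediction map. Supervising the full distribution is
    easier for a small student than matching exact local predictions."""
    p = softmax(teacher_logits.ravel(), tau)
    q = softmax(student_logits.ravel(), tau)
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))
```

The loss is zero exactly when the student reproduces the teacher's distribution, and positive otherwise.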
State-of-the-art 6D object pose estimation methods, including unsupervised ones, require many real training images. Unfortunately, for some applications, such as those in space or deep underwater, acquiring real images, even unannotated ones, is close to impossible. In this paper, we propose a method that can be trained solely on synthetic images, or optionally using a few additional real ones. Given a rough pose estimate obtained from a first network, it uses a second network to predict a dense 2D correspondence field between the image rendered using the rough pose and the real image, and infers the required pose correction. This approach is far less sensitive to the domain shift between synthetic and real images than state-of-the-art methods. It performs on par with methods trained on annotated real images, and outperforms them when only twenty real images are used.
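The pose-correction step is described only at a high level. As a simplified 2D stand-in for "infer a correction from a predicted correspondence field," the sketch below fits a least-squares similarity transform (Umeyama alignment) to matched point pairs; the actual method regresses a full 6D correction from a dense field, so this is an analogy under assumed names, not the paper's algorithm.

```python
import numpy as np

def fit_similarity(src, dst):
    """Least-squares 2D similarity transform (scale, rotation, translation)
    mapping src points onto dst points (Umeyama alignment)."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    s_c, d_c = src - mu_s, dst - mu_d
    cov = d_c.T @ s_c / len(src)            # cross-covariance
    U, S, Vt = np.linalg.svd(cov)
    D = np.eye(2)
    if np.linalg.det(U @ Vt) < 0:           # guard against reflections
        D[1, 1] = -1
    R = U @ D @ Vt
    var_src = (s_c ** 2).sum() / len(src)
    scale = (S * np.diag(D)).sum() / var_src
    t = mu_d - scale * R @ mu_s
    return scale, R, t
```

Given correspondences between the rendered and the real view, the recovered transform plays the role of the pose correction in this toy 2D setting.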
In this paper, we address the task of estimating the 3D orientation of previously unseen objects from monocular images. This task contrasts with the one considered by most existing deep learning methods, which typically assume that the test objects have been observed during training. To handle unseen objects, we follow a retrieval-based strategy and prevent the network from learning object-specific features by computing multi-scale local similarities between the query image and synthetically generated reference images. We then introduce an adaptive fusion module that robustly aggregates the local similarities into a global similarity score for each image pair. Furthermore, we speed up the retrieval process with a fast retrieval strategy. Our experiments on the LINEMOD, LINEMOD-Occluded, and T-LESS datasets show that our method yields significantly better generalization to unseen objects than previous works. Our code and pretrained models are available at https://sailor-z.github.io/projects/unseen_object_pose.html.
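The actual fusion module is learned; as a hypothetical sketch of the pipeline's shape, the code below computes per-scale cosine local similarities between aligned feature maps and fuses them into a single global score with softmax confidence weights. The weighting scheme and all names are assumptions for illustration only.

```python
import numpy as np

def local_similarity(fq, fr):
    """Cosine similarity between corresponding local features of shape (H, W, C)."""
    num = (fq * fr).sum(axis=-1)
    den = np.linalg.norm(fq, axis=-1) * np.linalg.norm(fr, axis=-1) + 1e-8
    return num / den

def fused_global_similarity(query_feats, ref_feats):
    """Aggregate multi-scale local similarities into one global score,
    weighting each scale by a softmax over its mean similarity
    (a stand-in for the learned adaptive fusion)."""
    per_scale = np.array([local_similarity(q, r).mean()
                          for q, r in zip(query_feats, ref_feats)])
    w = np.exp(per_scale) / np.exp(per_scale).sum()
    return float((w * per_scale).sum())
```

The reference image with the highest fused score would determine the retrieved orientation.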
Denoising diffusion probabilistic models (DDPMs) and Vision Transformers (ViTs) have shown significant progress in generative and discriminative tasks, respectively, yet so far these models have largely been developed within their own domains. In this paper, we establish a direct connection between DDPMs and ViTs by integrating the ViT architecture into a DDPM, and introduce a new generative model called the Generative ViT (GenViT). The modeling flexibility of ViTs further allows us to extend GenViT to hybrid discriminative-generative modeling, yielding the Hybrid ViT (HybViT). Our work is among the first to explore a single ViT for image generation and classification jointly. We conduct a series of experiments to analyze the performance of the proposed models and demonstrate that they surpass the previous state of the art in both generative and discriminative tasks. Our code and pretrained models can be found at https://github.com/sndnyang/diffusion_vit.
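GenViT plugs a ViT into the standard DDPM noise-prediction objective. A minimal NumPy sketch of that objective (with a pluggable `eps_model` callable standing in for the ViT backbone, which is too large to reproduce here) looks as follows; the schedule values are the common DDPM defaults, assumed rather than taken from the paper.

```python
import numpy as np

def make_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule and cumulative signal-retention products."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas_bar = np.cumprod(1.0 - betas)
    return betas, alphas_bar

def diffuse(x0, t, alphas_bar, rng):
    """Forward process: noise a clean sample to timestep t in closed form."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps
    return xt, eps

def ddpm_loss(eps_model, x0, alphas_bar, rng):
    """Simple DDPM objective: the model (a ViT in GenViT) predicts the
    added noise; the loss is the MSE between true and predicted noise."""
    t = rng.integers(len(alphas_bar))
    xt, eps = diffuse(x0, t, alphas_bar, rng)
    return float(np.mean((eps_model(xt, t) - eps) ** 2))
```

Training the hybrid HybViT would add a classification head and loss on top of the same backbone.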
Tensor recovery is an important problem in computer vision and machine learning. It is typically addressed via convex relaxations of the tensor rank and the $l_0$ norm, namely the nuclear norm and the $l_1$ norm, respectively. Convex approximations, however, are known to yield biased estimators. To overcome this problem, corresponding non-convex regularizers have been designed and adopted. Inspired by the recently developed matrix equivalent minimax-concave penalty (EMCP) theorem, this paper establishes a tensor equivalent minimax-concave penalty (TEMCP) theorem. The tensor equivalent MCP (TEMCP) is constructed as the non-convex regularizer part, and an equivalent weighted tensor $\gamma$ norm (EWTGN) as the low-rank part; both allow weight adaptivity. We then propose two corresponding adaptive models for two classic tensor recovery problems, low-rank tensor completion (LRTC) and tensor robust principal component analysis (TRPCA), with optimization algorithms based on the alternating direction method of multipliers (ADMM). The resulting adaptive iterative algorithms yield more accurate tensor recovery. For the tensor completion model, multispectral image (MSI), magnetic resonance imaging (MRI), and color video (CV) data are considered, while for the tensor robust principal component analysis model, hyperspectral image (HSI) denoising under Gaussian and salt-and-pepper noise is considered. The proposed algorithms outperform state-of-the-art methods, and experiments confirm their descent and convergence behavior.
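The paper's TEMCP and EWTGN constructions are not reproduced here, but the scalar minimax-concave penalty (MCP) they build on is standard: unlike the $l_1$ norm, it stops penalizing once an entry exceeds $\gamma\lambda$, reducing shrinkage bias. Its proximal operator (firm thresholding), the building block of ADMM-style solvers, is shown alongside.

```python
import numpy as np

def mcp(x, lam, gamma):
    """Minimax concave penalty: lam*|x| - x^2/(2*gamma) for |x| <= gamma*lam,
    constant gamma*lam^2/2 beyond -- so large entries are not over-shrunk."""
    a = np.abs(x)
    return np.where(a <= gamma * lam,
                    lam * a - a ** 2 / (2 * gamma),
                    gamma * lam ** 2 / 2)

def mcp_prox(x, lam, gamma):
    """Proximal operator of MCP for gamma > 1 (firm thresholding):
    zero small entries, partially shrink mid-range entries,
    leave large entries untouched."""
    s, a = np.sign(x), np.abs(x)
    return np.where(a <= lam, 0.0,
           np.where(a <= gamma * lam,
                    s * (a - lam) / (1.0 - 1.0 / gamma),
                    x))
```

In the adaptive models, such a proximal step would be applied to (weighted) singular values inside each ADMM iteration.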
Tensor sparsity modeling is a promising approach that has achieved great success across science and engineering. As is well known, data in practical applications are often generated by multiple factors, so tensors are used to represent data while preserving the internal structure induced by those factors. Unlike in the matrix case, however, constructing a reasonable sparsity measure for tensors is a relatively difficult and very important task. In this paper, we therefore propose a new tensor sparsity measure called the tensor full-feature measure (FFM). It simultaneously describes the feature information in each dimension of the tensor and the correlated features between any two dimensions, and it connects the Tucker rank with the tensor tube rank, thereby characterizing the sparsity of a tensor more comprehensively. On this basis, we establish its non-convex relaxation and apply the FFM to low-rank tensor completion (LRTC) and tensor robust principal component analysis (TRPCA). FFM-based LRTC and TRPCA models are proposed, and two efficient alternating direction method of multipliers (ADMM) algorithms are developed to solve them. Numerical experiments on various real-world data confirm their advantages over state-of-the-art methods.
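The FFM itself is not defined in this abstract. As background for the Tucker rank it builds on, the sketch below computes the vector of mode-$n$ unfolding ranks of a tensor; this is only the standard notion the measure connects to, not the FFM.

```python
import numpy as np

def mode_unfold(T, mode):
    """Mode-n unfolding: move the chosen axis to the front and flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def tucker_rank(T, tol=1e-8):
    """Tucker rank: the tuple of ranks of all mode-n unfoldings."""
    return tuple(int(np.linalg.matrix_rank(mode_unfold(T, m), tol=tol))
                 for m in range(T.ndim))
```

A rank-one tensor (an outer product of vectors) has Tucker rank $(1, \dots, 1)$, while a generic random tensor has full unfolding ranks.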
The low-rank tensor completion (LRTC) problem has attracted great attention in computer vision and signal processing, and achieving high-quality image recovery remains an urgent open task. This paper proposes a new tensor $l_{2,1}$ norm minimization model (TLNM) for the LRTC problem that, unlike classical tensor completion methods based on the tensor nuclear norm (TNN), integrates the sum-of-nuclear-norms (SNN) approach with the $l_{2,1}$ norm and the QR decomposition. To better exploit the local prior information of images, a total variation (TV) regularization term is introduced, leading to a new tensor $l_{2,1}$ norm minimization with total variation model (TLNMTV). Both proposed models are convex and therefore admit globally optimal solutions. Moreover, we adopt the alternating direction method of multipliers (ADMM) to obtain a closed-form solution for each variable, ensuring the feasibility of the algorithms. Numerical experiments show that the two proposed algorithms converge and outperform the comparison methods; in particular, our method significantly outperforms the competing approaches when the sampling rate of hyperspectral images is 2.5%.
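The tensor models are not fully specified in this abstract, but the matrix $l_{2,1}$ norm and its proximal operator (column-wise group soft-thresholding) are standard, and this prox is exactly the kind of closed-form ADMM subproblem solution the abstract refers to:

```python
import numpy as np

def l21_norm(X):
    """l_{2,1} norm of a matrix: the sum of the l2 norms of its columns."""
    return float(np.linalg.norm(X, axis=0).sum())

def l21_prox(X, tau):
    """Prox of tau * ||.||_{2,1}: shrink each column's norm by tau,
    zeroing columns whose norm is below tau (group soft-thresholding)."""
    norms = np.linalg.norm(X, axis=0)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return X * scale
```

Each ADMM iteration of such a model would apply this prox to one splitting variable while other variables get their own closed-form updates.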
Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models that are critical for real-world applications cannot, or only marginally, benefit from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distillation targets, losses, input, network regularization, and sequential distillation, revealing that: 1) distilling token relations is more effective than CLS-token- and feature-based distillation; 2) using an intermediate layer of the teacher network as the target performs better than using the last layer when the depth of the student mismatches that of the teacher; 3) weak regularization is preferred. With these findings, we achieve significant fine-tuning accuracy improvements over from-scratch MIM pre-training on ImageNet-1K classification, using the ViT-Tiny, ViT-Small, and ViT-Base models, with +4.2%/+2.4%/+1.4% gains, respectively. Our TinyMIM model of base size achieves 52.2 mIoU on ADE20K semantic segmentation, which is +4.1 higher than the MAE baseline. Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way to develop small vision Transformer models: exploring better training methods rather than introducing inductive biases into architectures, as in most previous works. Code is available at https://github.com/OliverRensu/TinyMIM.
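Token-relation distillation is described only at a high level here; a hypothetical sketch of the idea is to form row-softmaxed token-to-token similarity maps for teacher and student and match them with cross-entropy, so that relations between tokens, rather than individual token features, are transferred. Names and the temperature are assumptions, not TinyMIM's exact formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def token_relations(tokens, tau=1.0):
    """Row-softmaxed token-to-token similarity map for tokens of shape (N, C)."""
    sim = tokens @ tokens.T / tau
    return softmax(sim)

def relation_distill_loss(student_tokens, teacher_tokens):
    """Cross-entropy between teacher and student relation maps:
    minimized when the student reproduces the teacher's token relations."""
    p = token_relations(teacher_tokens)
    q = token_relations(student_tokens)
    return float(-(p * np.log(q + 1e-12)).sum(axis=-1).mean())
```

Because cross-entropy equals the teacher entropy plus a non-negative KL term, a student that matches the teacher's relations attains the minimum of this loss.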
The recent increase in public and academic interest in preserving biodiversity has led to the growth of the field of conservation technology. This field involves designing and constructing tools that utilize technology to aid in the conservation of wildlife. In this article, we will use case studies to demonstrate the importance of designing conservation tools with human-wildlife interaction in mind and provide a framework for creating successful tools. These case studies include a range of complexities, from simple cat collars to machine learning and game theory methodologies. Our goal is to introduce and inform current and future researchers in the field of conservation technology and provide references for educating the next generation of conservation technologists. Conservation technology not only has the potential to benefit biodiversity but also has broader impacts on fields such as sustainability and environmental protection. By using innovative technologies to address conservation challenges, we can find more effective and efficient solutions to protect and preserve our planet's resources.